Abstract

Understanding the forces that shape patterns of genetic variation across the genome is a major aim in evolutionary 
genetics. An emerging insight from analyses of genome-wide polymorphism and divergence data is that selection 
on linked sites can have an important impact on neutral genetic variation. However, in contrast to Drosophila, 
which exhibits a signature of recurrent hitchhiking, many plant genomes studied so far seem to mainly be affected 
by background selection. Moreover, many plants do not exhibit classic signatures of linked selection, such as a cor- 
relation between recombination rate and neutral diversity. In this review, I discuss the impact of genome architecture 
and mating system on the expected signature of linked selection in plants and review empirical evidence for linked 
selection, with a focus on plant model systems. Finally, I discuss the implications of linked selection for inference of 
demographic history in plants. 

INTRODUCTION

Understanding the forces that shape genetic variation 
is of great general as well as applied interest. As a 
result of recent massive advances in sequencing tech- 
nologies, we now have access to an unprecedented 
amount of genomic data (e.g. [1–5]). However, despite
increasing data availability, many challenges
remain when it comes to understanding what evo- 
lutionary forces dominate in shaping patterns of 
polymorphism across genomes. 

Since the seminal work of Begun and Aquadro [6] 
it has been recognized that the interaction between 
selection and recombination, or linked selection, can 
have a profound impact on levels of genetic variation 
across the genome. This is true for different forms of 
selection: under a hitchhiking model, the increase in 
frequency of a beneficial mutation results in a local 
reduction of genetic variation as linked neutral vari- 
ants are swept to fixation along with the beneficial 
mutation [7, 8] (Table 1). Under a background se- 
lection model, the continued removal of deleterious 
alleles by purifying selection also results in locally 



reduced variation at linked sites [9, 10] (Table 1). 
Finally, interference between linked selected variants 
reduces the efficacy of selection (Hill–Robertson
interference; [11]). A common feature of most
forms of linked selection is that they are expected
to result in a characteristic signature of reduced levels
of neutral variation in low-recombination regions of
the genome, which are more affected by selection at
linked sites.

An emerging insight from analyses of genome-wide
polymorphism and divergence data is that effects
of linked selection may be much more pervasive
than previously thought [12–14]. Indeed, it has been
suggested that in some organisms, such as
Drosophila simulans, most neutral sites in the genome
have been affected by linked selection in the form of
recurrent hitchhiking [13] (Box 1). However, while
the role of linked selection is well established for
Drosophila, the case is less clear when it comes to
plants, which in many cases have not exhibited classical
signatures of linked selection, such as a correlation
between recombination rate and level of
neutral diversity [15–17]. Here, I will discuss how
plant genome architecture and mating system affect
the signature of linked selection, review empirical
evidence for linked selection, with a focus on plant
model systems, and discuss the implications of linked
selection for the estimation of demographic history
in plants.

Table 1: Glossary of linked selection

Term                   Explanation
Linked selection       When positive or purifying selection affects linked
                       genetic variation.
Selective sweep        When positive selection on a beneficial allele leads
                       to a rapid increase in its frequency. This process
                       generally leads to reduced polymorphism at linked sites.
Background selection   When purifying selection on deleterious alleles leads
                       to reduced diversity at linked sites.



THE IMPACT OF LINKED 
SELECTION ON PLANT GENOMIC 
VARIATION 

Models of linked selection predict a positive correl- 
ation between recombination rates and levels of neu- 
tral diversity; however, this prediction only holds if 
the rate and intensity of selection are uniform across 
the genome. If the density of selected sites varies, a 
more general expectation is that neutral diversity will 
depend on the density of selected sites per recombi- 
national map unit [10, 18, 19]. Thus, in contrast to 
neutral models, the specifics of genome architecture 
(e.g. density of genes and other functional elements, 



recombination rate variation, chromosome number 
and length) are important for the distribution of neu- 
tral variation under linked selection models. 

A positive association between recombination rate 
and neutral polymorphism was observed early on in 
several plant species, including sea beets [20], Aegilops 
[21] and tomatoes [22]. However, the strength of the 
association between recombination rate and diversity 
was often considerably weaker than that seen in 
Drosophila (e.g. [23]). Moreover, in Arabidopsis thaliana,
early studies found no correlation between recombination
rate and neutral diversity [15, 24].
Likewise, an early study in Arabidopsis lyrata found 
no general reduction of non-coding polymorphism 
in low-recombination regions close to centromeres 
[16]. However, when recombination rates and gene 
densities are correlated, as they are in Arabidopsis [25], 
linked selection can result in a negative correlation 
between functional density and neutral diversity 
rather than the typical pattern of reduced diversity 
in low-recombination regions. Such a negative cor- 
relation between gene density and neutral diversity 
has indeed been observed in A. thaliana [15, 26]. A 
similar pattern was recently observed in Oryza rufipo- 
gon, where it was shown to be consistent with the 
action of background selection [17]. In the model 
legume Medicago truncatula, there is both a positive
correlation between recombination rate and silent 
diversity, and a negative correlation between gene 
density and silent diversity [27]. In contrast, a nega- 
tive correlation between gene density and neutral 
diversity has so far not been observed in A. lyrata 
[16], although this remains to be revisited using 
genome-wide data, now that the A. lyrata genome 
sequence is available [28]. 

When can we expect linked selection to result in 
a negative correlation between gene density and 
neutral diversity? This will depend on details of
genome architecture, as well as on plant life-history
traits. In plants, a potentially important factor underlying
variation in linked selection is variation in the
mating system. Because self-fertilization (selfing) results
in a lower degree of effective recombination,
linkage disequilibrium is expected to extend over
longer distances in selfers (Box 2), and linked selection is
therefore expected to have an impact over larger
genomic distances in highly, but not exclusively, selfing
species [29]. However, aspects of genome architecture,
such as correlations between the density of
sites under selection and the recombination rate, can
obscure the signature of linked selection.



Box 1: Types of selective sweeps

'Soft sweeps' occur when positive selection acts to increase 
the frequency of several equally beneficial alleles on different 
genetic backgrounds, in contrast to 'hard sweeps', which in- 
volve selective fixation of a new beneficial mutation. 'Partial 
sweeps' occur when selection acts on multiple loci that are 
involved in adaptation, but does not necessarily lead to fix- 
ation of beneficial alleles at any of them. Finally, 'recurrent 
hitchhiking' occurs when selective sweeps happen repeatedly 
over evolutionary time. Hard sweeps and recurrent hitchhik- 
ing often lead to a distinguishable signature of elevated di- 
vergence at sites under selection coupled with reduced silent 
diversity and skewed allele frequency distributions in the 
vicinity of those sites. In contrast, the signature of partial 
and soft sweeps can be considerably more difficult to detect. 

Box 2: Effects of mating system on effective recombination
rates and linkage disequilibrium

'Linkage disequilibrium' is defined as the non-random asso- 
ciation of alleles among loci. Recombination breaks down 
allelic associations in double heterozygotes. Because self-fer- 
tilization reduces heterozygosity, recombination is less effi- 
cient at breaking up allelic associations in self-fertilizing 
species. Linkage disequilibrium can therefore be more exten- 
sive in self-fertilizers. 



To investigate when a signature of linked selection
would be expected to be evident in species
with a genome architecture similar to those of
A. thaliana and A. lyrata, I modelled the expected
reduction of neutral diversity under the background
selection model of Hudson and Kaplan [10], closely
following the approach used in Rockman
et al. [30] and Flowers et al. [17]. Briefly, the
method of Hudson and Kaplan ([10]; Equation 
(15)) allows one to estimate the expected reduc- 
tion of neutral diversity in discrete intervals across 
a chromosome, with the effect of background se- 
lection on neutral diversity expressed as a function 
of the recombination rate and the proportion of 
sites subject to deleterious mutations in linked 
intervals. The modification of Rockman et al.
[30] incorporates the effects of partial selfing or
other deviations from panmixia on effective recombination
rates through a scaling factor, the
index of panmixia (P). Calculations were performed
over a grid of values of P and a combined
parameter incorporating both the intensity of selection
and the dominance coefficient (sh). The
dominance coefficient is included because dominant
mutations are expected to be more efficiently
selected against than recessive alleles, especially
when rare. The considered values of P were
equally spaced on a log scale between 0.002 and
1, and those for sh were equally spaced on a log
scale between 5 × 10⁻⁵ and 0.5.
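
The calculation described above can be sketched in a few lines. This is a minimal illustration and not the code used for Figures 1 and 2. It assumes one commonly cited interval form of the background selection approximation, in which the diversity ratio at a focal map position q is exp(-G), with G summing U_k sh / [2 (sh + P d_left)(sh + P d_right)] over intervals, where U_k is the diploid deleterious mutation rate of interval k and d_left and d_right are the map distances (in Morgans) from q to the two ends of the interval. The exact expression should be checked against Hudson and Kaplan [10] and Rockman et al. [30]. The function names, the three-interval toy chromosome and the number of grid points are hypothetical.

```python
import math


def log_grid(lo, hi, n):
    """Return n values equally spaced on a log10 scale from lo to hi."""
    step = (math.log10(hi) - math.log10(lo)) / (n - 1)
    return [10 ** (math.log10(lo) + i * step) for i in range(n)]


def bgs_ratio(q, bounds, U, sh, P):
    """Expected neutral diversity with versus without background selection.

    q       focal position on the genetic map (Morgans)
    bounds  interval boundaries on the genetic map, len(U) + 1 values
    U       diploid deleterious mutation rate assigned to each interval
    sh      compound selection parameter
    P       index of panmixia, scaling map distances (1 = random mating)
    """
    g = 0.0
    for k, u_k in enumerate(U):
        lo, hi = bounds[k], bounds[k + 1]
        pieces = [(lo, hi, u_k)]
        if lo < q < hi:
            # Split the interval containing q so each piece lies on one side.
            frac = (q - lo) / (hi - lo)
            pieces = [(lo, q, u_k * frac), (q, hi, u_k * (1.0 - frac))]
        for a, b, u in pieces:
            g += u * sh / (2.0 * (sh + P * abs(q - a)) * (sh + P * abs(q - b)))
    return math.exp(-g)


# Hypothetical toy chromosome: three intervals on a 1-Morgan map, with the
# middle interval carrying few selected sites (illustrative values only).
bounds = [0.0, 0.45, 0.55, 1.0]
U = [0.03, 0.002, 0.03]

P_values = log_grid(0.002, 1.0, 5)   # a grid size of 5 is an assumption
sh_values = log_grid(5e-5, 0.5, 5)

for P in P_values:
    row = [bgs_ratio(0.5, bounds, U, sh, P) for sh in sh_values]
    print(f"P={P:.3g}: " + " ".join(f"{x:.3f}" for x in row))
```

Because P multiplies every map distance, lowering P (more selfing) shrinks the denominators and strengthens the predicted reduction at a given sh, which is the qualitative behaviour the grid is meant to explore.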

Figures 1 and 2 show the expected effect of back- 
ground selection on neutral diversity in A. thaliana 
and A. lyrata, respectively, over a range of outcrossing 
rates and varying strengths of selection, and with 
estimates of recombination rates and densities of se- 
lected sites based on empirical data. In these figures, 
the diploid genome-wide deleterious mutation rate 
U is assumed to be 0.33 and 0.32 for A. thaliana and 
A. lyrata, respectively. These values for U are based
on estimates of the proportion of sites under
constraint of 0.177 for A. thaliana and 0.113 for A.
lyrata [31], a genome size of 135 Mb for A. thaliana
and 206.7 Mb for A. lyrata, and a mutation rate of
7.0 × 10⁻⁹ per site per generation for both species [32]. Recombination rate
estimates for A. thaliana are based on the P66 cross in 
Salome et al. [33] and for A. lyrata they are based on 
Aalto et al. [34]. The proportion of sites subject to 
deleterious mutation in each interval is based on 
conserved regions identified in Haudry et al. [31]. 
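
As a rough check on these inputs, U can be recomputed as the product of the mutation rate, the genome size and the proportion of sites under constraint. The sketch below assumes a per-site mutation rate of 7.0 × 10⁻⁹ per generation and a factor of two for diploidy; the factor of two and the function name are assumptions made here, not statements from the text. Under those assumptions the result is about 0.335 for A. thaliana and about 0.327 for A. lyrata, close to the values of 0.33 and 0.32 given above.

```python
def genome_wide_U(mu, genome_size_bp, constrained_fraction, ploidy=2):
    """Genome-wide deleterious mutation rate per generation.

    mu                    per-site mutation rate per generation
    genome_size_bp        haploid genome size in base pairs
    constrained_fraction  proportion of sites under constraint
    ploidy                2 for a diploid rate (an assumption of this sketch)
    """
    return ploidy * mu * genome_size_bp * constrained_fraction


MU = 7.0e-9  # assumed per-site mutation rate per generation

u_thaliana = genome_wide_U(MU, 135e6, 0.177)
u_lyrata = genome_wide_U(MU, 206.7e6, 0.113)

print(f"A. thaliana U = {u_thaliana:.3f}")
print(f"A. lyrata   U = {u_lyrata:.3f}")
```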

In both species, the genomic impact of background
selection is expected to result in a negative
correlation between gene density and neutral diversity,
as long as there is not complete selfing (and/or
other strong deviations from panmixia) and as long as
purifying selection is not very strong (Figures 1 and
2). Furthermore, the model predicts higher levels of 
neutral diversity in pericentromeric regions, which 
harbor a lower density of sites subject to deleterious 
mutations. This is not strongly dependent on assumptions
regarding U, as results are qualitatively
similar for values of U of 0.15 and 0.60
(Supplementary Figures S1–S4).

In A. thaliana and A. lyrata, analyses of the distri- 
bution of fitness effects for non-synonymous muta- 
tions suggest that weak purifying selection is the 
predominant mode of selection and there is little 
evidence for high rates of adaptive non-synonymous 
fixations [35–37]. Furthermore, outcrossing rates
have been estimated to be ~3% in A. thaliana [38]. 
Under this level of outcrossing and with weak selec- 
tion, a clear signature of background selection is a 
negative relationship between gene density and neu- 
tral diversity. This pattern has indeed been observed 
in A. thaliana [15]. Similar results have also been ob- 
tained for M. truncatula [39], suggesting that the nega- 
tive correlation between gene density and silent 
diversity in this species [27] could also be explained 
by background selection. The model also predicts 
that the signature of background selection should 
be weaker in outcrossers, such as A. lyrata, in line 
with empirical observations [16]. However, elevated 
neutral diversity in low-recombination pericentro- 
meric regions, which has been observed in A. lyrata 
[16], is consistent with the action of weak back- 
ground selection (Figure 2). Qualitatively, therefore, 
population genetic patterns in A. thaliana and 
A. lyrata seem to fit a simple model of background 
selection quite well. 

The patterns observed in many plant species so 
far seem to be consistent with a major role for 
weak purifying selection in shaping patterns of 

Figure 1: Expected impact of background selection on neutral diversity in A. thaliana. (A) The predicted reduction
in neutral diversity (ratio of neutral diversity with versus without background selection) is plotted over a grid of
two parameters which measure the strength of selection (sh, a combined parameter incorporating the selection
intensity and the dominance coefficient) and the deviation from panmixia (P). Dots indicate the parameter
combinations plotted in panels (B) and (C). The three different values of P correspond to outcrossing rates of 0.06%, 3.9%
and 29.9%, assuming all deviation from panmixia is a result of self-fertilization, and the sh values are 5 × 10⁻⁵,
3 × 10⁻³ and 0.1. (B) Relative proportions of neutral diversity across A. thaliana chromosome 1 for the nine parameter
combinations indicated in A. Grey boxes mark the centromeric region on chromosome 1. (C) Conditions under
which background selection is expected to lead to a negative correlation between gene density and neutral diversity.
The predicted reduction of neutral diversity under background selection is shown for four quartiles of gene density,
ranging from those with the lowest gene density (Q1) to those with the highest gene density (Q4).



polymorphism, with linked selection mainly having 
an impact on the genomic distribution of neutral 
variation in selfing taxa. It is currently not clear
why plants should be experiencing less recurrent
hitchhiking than, for instance, Drosophila. Possible
explanations include factors that reduce the efficacy 
of natural selection relative to drift, such as low 
effective population sizes, strong population struc- 
ture^, 40], or effects of mating system on adap- 
tation [41, 42]. These effects have recently been 
reviewed thoroughly [43, 44] and hence will not 
be covered in more detail here. However, it
should be noted that evidence for recurrent hitchhiking
has recently been found in one plant species,
the outcrossing species Capsella grandiflora [45]. This
species has relatively weak population structure, a



large effective population size and low levels of 
linkage disequilibrium, factors that are expected 
to render natural selection more effective [36]. 
To further elucidate when we can expect to ob- 
serve recurrent hitchhiking in plants, genomic data 
for more species with a range of outcrossing rates 
and effective population sizes are required. 

Another explanation for the relative dearth of 
evidence for recurrent hitchhiking in plants could 
be that forms of adaptation that do not necessarily 
lead to a signature of species-wide sweeps are 
common in plants. Local adaptation is one such 
form of selection that is not expected to lead to 
species-wide sweep signatures. There is increasing 
evidence for local adaptation in A. thaliana, both 
from reciprocal transplant studies [46] and from 






Slotte 

Figure 2: Expected impact of background selection on neutral diversity in A. lyrata. (A) Predicted reduction in
neutral diversity over the same grid of the index of panmixia and the compound selection parameter as in Figure
1A. (B) Relative proportions of neutral diversity across A. lyrata chromosome 1 for the nine parameter combinations
indicated in A. Grey boxes mark the approximate location of the centromere. (C) Conditions under which background
selection should lead to a negative correlation between gene density and neutral diversity. Expected reductions
in neutral diversity under background selection are shown separately for four gene density quartiles labelled
Q1 to Q4, ranging from lowest to highest gene density.



studies that have quantified fitness components in 
mapping populations grown in the field [47]. 
Evidence for local adaptation has also been found 
using genomic approaches that search for correl- 
ations between allele frequencies and environmen- 
tal variables (e.g. [48, 49]) and by combining 
genomic analyses with common garden experi- 
ments [50] to identify locally adaptive alleles. 
However, the extent to which local adaptation 
affects patterns of polymorphism genome-wide is 
still an open question in A. thaliana as well as in 
most other plant species. 

While recurrent hitchhiking may be rare in 
plants, this does not preclude an important role 
for selective sweeps in plant adaptation. For in- 
stance, a recent study that analysed genome se- 
quences from 180 lines of A. thaliana from 
Sweden found many signatures of selective 
sweeps, including a massive sweep on chromosome 
1 involving a 700-kb transposition [51]. There is 



also evidence for partial selective sweeps in A.
thaliana [52], but the general importance of different
forms of sweeps, such as partial and soft sweeps
(Box 1), remains unclear.

Linked selection is also expected to have a major 
impact on levels of population differentiation. If 
there is diversifying selection, with positive selection 
driving alleles to high frequencies in some but not all 
populations under study, increased differentiation is 
expected at loci under selection, as well as at closely 
linked loci [53]. Similarly, background selection can 
lead to elevated F_ST, particularly in regions of low
recombination, because it decreases the effective
population size experienced by linked loci [54], although
the locus under selection itself is expected to
exhibit reduced F_ST. Thus, both forms of selection
are expected to result in a negative correlation
between recombination rate and F_ST [55].
Distinguishing between these hypotheses requires
examining additional measures of differentiation
that do not rely on within-population diversity;
under background selection no elevation of absolute
divergence is expected [53, 54]. This contrast was
recently used to demonstrate that elevated values of
F_ST on Silene latifolia Y-chromosomes are not a result
of local adaptation, but instead caused by other processes
reducing variability on Y-chromosomes [56].
Many studies have conducted genome scans in plants 
with the purpose of identifying candidate loci for 
local adaptation (reviewed in Strasburg et al. [57]), 
but these do not usually examine correlations be- 
tween population differentiation and recombination 
rates, perhaps because estimates of recombination 
rates have previously not been available for many 
non-model species. With the recent rapid advances 
in genome sequencing and genotyping methods, this 
area seems ripe for further investigation. 
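
The contrast between relative and absolute measures of differentiation can be made concrete with a small sketch. It assumes biallelic sites, population allele frequencies without any sample-size correction, and a ratio-of-averages estimator of the form F_ST = 1 - (mean within-population diversity) / (between-population divergence). The symbol d_XY for absolute divergence, the function names and the numerical values are assumptions introduced here for illustration, and the code is not the method of any study cited above.

```python
def dxy(p1, p2):
    """Mean absolute divergence between two populations across sites.

    p1, p2  per-site frequencies of the same allele in populations 1 and 2
    """
    return sum(a * (1 - b) + b * (1 - a) for a, b in zip(p1, p2)) / len(p1)


def pi_within(p1, p2):
    """Within-population diversity, averaged over the two populations."""
    return sum(a * (1 - a) + b * (1 - b) for a, b in zip(p1, p2)) / len(p1)


def fst_from_diversity(pi_w, d_xy):
    """Relative differentiation from within- and between-population diversity."""
    return 1.0 - pi_w / d_xy


def fst(p1, p2):
    """F_ST from allele frequencies, as a ratio of averages across sites."""
    d = dxy(p1, p2)
    return float("nan") if d == 0 else fst_from_diversity(pi_within(p1, p2), d)


# Illustrative values only: halving within-population diversity, as a loss of
# diversity at linked sites might, raises F_ST although d_XY is unchanged.
print(fst_from_diversity(pi_w=0.008, d_xy=0.010))
print(fst_from_diversity(pi_w=0.004, d_xy=0.010))
```

In this toy comparison F_ST rises from about 0.2 to about 0.6 while absolute divergence stays at 0.010, which is why a measure that does not depend on within-population diversity is needed to separate the two explanations.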



EFFECTS OF PLANT GENOME SIZE 
VARIATION ON LINKED 
SELECTION 

Plants vary over 1000-fold in genome size, due to 
polyploidy and variation in the content of repetitive 
elements [58]. This might have consequences for the 
impact of linked selection. In background selection 
models, a key parameter is the genome-wide dele- 
terious mutation rate, U, a rough estimate of which 
can be obtained as a product of the mutation rate, 
the genome size, and the proportion of sites under 
constraint [59]. If plant genome-size variation affects 
U, it should also affect the impact of background 
selection. For instance, if polyploidization leads to 
relaxed selection on duplicate genes genome-wide, 
as theory predicts [60], background selection may be
relaxed in polyploid genomes. Studies of paleopoly- 
ploid genomes, such as that of A. thaliana, suggest 
that duplicate gene loss is indeed the most frequent 
outcome of whole-genome duplication [61]. On the 
other hand, duplicate genes that are retained experi- 
ence elevated levels of purifying selection [62]. 
Genome size increases due to expansion of repetitive 
elements may also be associated with reduced re- 
combination rates, as the rate of crossovers is gener- 
ally reduced in heterochromatic regions [58, 63]. 
Exploring the effects on linked selection when 
there are concomitant changes in genome size, re- 
combination rates and levels of constraint will thus 
be important for interpretation of broad comparative 



genomic studies of the effects of linked selection in 
plants. 

CONSEQUENCES OF LINKED 
SELECTION FOR DEMOGRAPHIC 
INFERENCE IN PLANTS 

If linked selection is pervasive, patterns of variation 
at neutral sites linked to selected sites may largely 
reflect the rate and strength of selection (i.e. gen- 
etic draft; [64]), rather than demographic history. 
In a recent simulation study using realistic param- 
eter estimates for human data, Messer and Petrov 
[65] demonstrated that linked selection can lead to 
significant skews in synonymous site frequency 
spectra, to the extent that demographic expansions 
were falsely inferred. The skew was exacerbated 
under higher levels of adaptive fixations, but was 
present at rates of adaptation as low as 0.1. The 
proportion of adaptive fixations at non-synonymous
sites has been estimated to be higher than
this in several plant species (e.g. Populus [66], C.
grandiflora [36], and Helianthus [40, 67]). Indeed, a
recent study of genomic patterns of variation in C. 
grandiflora found evidence for recurrent hitchhiking, 
as well as a skew towards rare alleles at synonymous
sites [45], consistent with the results of Messer
and Petrov [65]. While the exact effects of linked 
selection should be examined using simulations 
with realistic settings for the study species in ques- 
tion as well as with empirical data, these results 
suggest that care should be taken when choosing
which sites to use for demographic inference. In
the human literature, studies are already starting 
to appear that take this consideration seriously 
and consequently analyse demographic history 
using non-coding regions far from sites under se- 
lection (e.g. [68]). Such approaches may be more 
difficult in plants, as plant non-coding regions can 
be difficult to align reliably due to their dynamic 
nature and often high content of repetitive elem- 
ents. However, with an improved understanding of 
the impact of linked selection in plant genomes, it 
is likely that considerations of the effects of linked 
selection will also become important for demo- 
graphic inference in plants. 
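
A skew towards rare alleles in a site frequency spectrum can be summarized in several ways. One standard summary, which is not used in the text and is introduced here only as an illustration, is Tajima's D: it compares nucleotide diversity with the diversity expected from the number of segregating sites, and it is negative when rare alleles are in excess. The sketch below computes D from an unfolded spectrum; the function name and the example spectra are hypothetical, and the code does not model linked selection or demography.

```python
import math


def tajimas_d(sfs):
    """Tajima's D from an unfolded site frequency spectrum.

    sfs[i - 1] is the number of sites at which the derived allele is
    carried by i of the n sampled chromosomes, for i = 1 .. n - 1.
    """
    n = len(sfs) + 1
    S = sum(sfs)
    if S == 0:
        return float("nan")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    # Mean number of pairwise differences implied by the spectrum.
    pairs = n * (n - 1) / 2.0
    pi = sum(c * i * (n - i) for i, c in enumerate(sfs, start=1)) / pairs
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


# Hypothetical spectra for n = 10 chromosomes (illustrative values only).
neutral_like = [120.0 / i for i in range(1, 10)]   # proportional to 1/i
rare_skewed = [300, 20, 10, 5, 3, 2, 1, 1, 1]      # excess of singletons

print(tajimas_d(neutral_like))
print(tajimas_d(rare_skewed))
```

A spectrum proportional to 1/i gives a value of essentially zero, whereas the singleton-heavy spectrum gives a negative value, the direction of skew that both population expansion and recurrent hitchhiking can produce.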

SUPPLEMENTARY DATA 

Supplementary data are available online at
http://bib.oxfordjournals.org/.

Key Points

• Signatures of linked selection in plants often differ from those in 
animals, and taking account of variation in the density of selected 
sites is important. 

• There is accumulating evidence for an important role for back- 
ground selection in many plants studied so far, whereas evidence 
for recurrent hitchhiking is scarce. 

• More work on the effects of plant genome size variation on 
linked selection is needed. 

• Linked selection can have a marked impact on inference of 
demographic history, and this should be considered when choos- 
ing sites for demographic inference in plants. 